Graphical Models for Recovering Probabilistic and Causal Queries from Missing Data
Abstract
We address the problem of deciding whether a causal or probabilistic query is estimable from data corrupted by missing entries, given a model of the missingness process. We extend the results of Mohan et al. [2013] by presenting more general conditions for recovering probabilistic queries of the form P(y|x) and P(y, x), as well as causal queries of the form P(y|do(x)). We show that causal queries may be recoverable even when the factors in their identifying estimands are not recoverable. Specifically, we derive graphical conditions for recovering causal effects of the form P(y|do(x)) when Y and its missingness mechanism are not d-separable. Finally, we apply our results to problems of attrition and characterize the recovery of causal effects from data corrupted by attrition.
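To make the notion of recoverability concrete, the following is a minimal simulation sketch, not the paper's method. It assumes a toy model (our own, with made-up probabilities) in which X is fully observed and the missingness indicator of Y, here called `r_y` (1 = missing, 0 = observed), depends only on X. Under that assumption Y is independent of its missingness mechanism given X, so P(y|x) should match its complete-case estimate P(y|x, r_y = 0), whereas the naive complete-case estimate of the marginal P(y) is expected to be biased.

```python
import random

def simulate(n=200_000, seed=0):
    """Draw (x, y, r_y) rows from a toy model; all probabilities are illustrative."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = 1 if rng.random() < 0.5 else 0
        y = 1 if rng.random() < (0.8 if x else 0.3) else 0
        # r_y = 1 means Y is missing; missingness depends on X only (assumption).
        r_y = 1 if rng.random() < (0.6 if x else 0.2) else 0
        rows.append((x, y, r_y))
    return rows

def mean_y(rows, x_val=None, observed_only=False):
    """Fraction of y = 1, optionally restricted to X = x_val and/or observed Y."""
    sel = [y for x, y, r in rows
           if (x_val is None or x == x_val) and (not observed_only or r == 0)]
    return sum(sel) / len(sel)

rows = simulate()

# Conditional query: the complete-case estimate agrees with the full-data value.
cond_full = mean_y(rows, x_val=1)
cond_cc = mean_y(rows, x_val=1, observed_only=True)

# Marginal query: dropping incomplete rows without adjusting for X is biased.
marg_full = mean_y(rows)
marg_cc = mean_y(rows, observed_only=True)

print(f"P(y=1|x=1): full={cond_full:.3f}, complete-case={cond_cc:.3f}")
print(f"P(y=1):     full={marg_full:.3f}, complete-case={marg_cc:.3f}")
```

In this toy setting the marginal can still be recovered by reweighting the complete-case conditionals by the observed distribution of X; the paper's contribution concerns harder cases, where Y and its missingness mechanism cannot be separated in this way.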
Similar resources
A Tutorial on Learning with Bayesian Networks
A Bayesian network is a graphical model that encodes probabilistic relationships among variables of interest. When used in conjunction with statistical techniques, the graphical model has several advantages for data analysis. One, because the model encodes dependencies among all variables, it readily handles situations where some data entries are missing. Two, a Bayesian network can be used to ...
Identifiability of Model Properties in Over-Parameterized Model Classes
Classical learning theory is based on a tight linkage between hypothesis space (a class of functions on a domain X), data space (function-value examples (x, f(x))), and the space of queries for the learned model (predicting function values for new examples x). However, in many learning scenarios the 3-way association between hypotheses, data, and queries can really be much looser. Model classes ...
Causality in the Sciences
This paper presents a general theory of causation based on the Structural Causal Model (SCM) described in (Pearl, 2000a). The theory subsumes and unifies current approaches to causation, including graphical, potential outcome, probabilistic, decision analytical, and structural equation models, and provides both a mathematical foundation and a friendly calculus for the analysis of causes and cou...
Compiling Relational Database Schemata into Probabilistic Graphical Models
A majority of scientific and commercial data is stored in relational databases. Probabilistic models over such datasets would allow probabilistic queries, error checking, and inference of missing values, but to this day machine learning expertise is required to construct accurate models. Fortunately, current probabilistic programming tools ease the task of constructing such models [1, 2, 3, 4, ...
Application of Bayesian networks to two classification problems in bioinformatics
The application of machine learning techniques to bioinformatics problems has become increasingly popular in recent years. Of particular interest are probabilistic graphical models since they provide a concise representation for inferring models from data. Current applications include the learning of gene regulatory networks (Friedman, 2004) and protein function prediction. Bayesian networks ar...